Robust distances for outlier-free goodness-of-fit testing

Authors

  • Andrea Cerioli
  • Alessio Farcomeni
  • Marco Riani
Abstract

Robust distances are mainly used for the purpose of detecting multivariate outliers. The precise definition of cut-off values for formal outlier testing assumes that the “good” part of the data comes from a multivariate normal population. Robust distances also provide valuable information on the units not declared to be outliers and, under mild regularity conditions, they can be used to test the postulated hypothesis of multivariate normality of the uncontaminated data. This approach is not influenced by nasty outliers and thus provides a robust alternative to classical tests for multivariate normality relying on Mahalanobis distances. One major advantage of the suggested procedure is that it takes into account the effect induced by trimming of outliers in several ways. First, it is shown that stochastic trimming is an important ingredient for the purpose of obtaining a reliable estimate of the number of “good” observations. Second, trimming must be allowed for in the empirical distribution of the robust distances when comparing them to their nominal distribution. Finally, alternative trimming rules can be exploited by controlling alternative error rates, such as the False Discovery Rate. Numerical evidence based on simulated and real data shows that the proposed method performs well in a variety of situations of practical interest. It is thus a valuable companion to the existing outlier detection tools for the robust analysis of complex multivariate data structures.
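
As a rough illustration of the trimming idea mentioned in the abstract, the sketch below flags outliers from squared robust distances while controlling the False Discovery Rate with the Benjamini-Hochberg step-up rule, and then counts the remaining “good” observations. It is a minimal sketch under stated assumptions, not the authors' procedure: the data are taken to be bivariate, the squared distances are compared with a chi-square distribution with 2 degrees of freedom (an asymptotic approximation; the abstract does not say which nominal distribution the paper uses), the distance values are made up for illustration, and the function names `chi2_sf_2df` and `benjamini_hochberg` are hypothetical.

```python
import math


def chi2_sf_2df(d2):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom.

    For 2 degrees of freedom the survival function has the closed form exp(-x / 2).
    """
    return math.exp(-d2 / 2.0)


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the sorted indices rejected by the Benjamini-Hochberg step-up rule."""
    n = len(pvalues)
    # Indices ordered from the smallest to the largest p-value.
    order = sorted(range(n), key=lambda i: pvalues[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        # Remember the largest rank whose p-value lies under the BH line.
        if pvalues[idx] <= alpha * rank / n:
            k = rank
    return sorted(order[:k])


# Illustrative squared robust distances for 10 bivariate units (made-up values).
squared_distances = [0.3, 1.1, 0.8, 2.5, 14.2, 0.6, 1.9, 18.7, 0.2, 3.1]

# Assumption: chi-square reference distribution with 2 degrees of freedom.
pvalues = [chi2_sf_2df(d2) for d2 in squared_distances]

flagged = benjamini_hochberg(pvalues, alpha=0.05)
n_good = len(squared_distances) - len(flagged)

print("flagged units:", flagged)
print("estimated number of good observations:", n_good)
```

Because the number of flagged units depends on the data, the amount of trimming here is random rather than fixed in advance, which is the sense in which such a rule is "stochastic". A goodness-of-fit check on the retained distances would still have to allow for this trimming, as the abstract stresses.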

Similar resources

An Updated Review of Goodness of Fit Tests Based on Entropy

Different approaches to goodness of fit (GOF) testing have been proposed. This survey presents the developments in goodness-of-fit testing based on entropy during the last 50 years, from the very first origins to the most recent advances for different data and models. Goodness-of-fit tests based on Shannon entropy were started by Vasicek in 1976 and were continued by many authors. In this paper, ...

An Alternative Robust Model for in situ Degradation Studies “Korkmaz-Uckardes”

The first purpose of this study is to present an alternative robust model in order to describe ruminal degradation kinetics of forages and to minimize the fitting problems. For this purpose, the Korkmaz-Uckardes (KU) model, which has a logarithmic structure, was developed. The second purpose of this study is to estimate, by using the Korkmaz-Uckardes (KU) model, the parameters tp (the time to pr...

Robust tests for testing the parameters of a normal population

This article aims to provide a simple robust method to test the parameters of a normal population by using the new diagnostic tool called the “Forward Search” (FS) method. The most commonly used procedures to test the mean and variance of a normal distribution are Student’s t test and Chi-square test, respectively. These tests suffer from the presence of outliers. We introduce the FS version of...

Outlier Detection in Survival Analysis

Outlier detection is an important task in many data-mining applications. In this paper, we present two parametric outlier detection methods for survival data. Both methods propose to perform outlier detection in a multivariate setting, using the Cox regression as the model and the concordance c-index as a measure of goodness of fit. The first method is a single-step procedure that presents a de...

Testing the Exactitude of Estimation Methods in the Presence of Outliers: An accounting for Robust Kriging

Estimation of gold reserves and resources has been of interest to mining engineers and geologists for ages. Outlier values indicate the economic part of a deposit, provided that they do not stem from human or technical errors. The presence of these high values causes a spurious, dramatic increase in the variance estimate of economic blocks when applying conventional m...

Journal:
  • Computational Statistics & Data Analysis

Volume 65

Publication date: 2013